ABSTRACT


Given an input-output sequence of syntactic translations of sentences generated by a deterministic finite state grammar $G$ into $\Phi^*$, a method is given for discovering the function which maps productions of $G$ into $\Phi^*$ that gives rise to the observed translation.


1.  INTRODUCTION 

Let $G = (V_N, V_T, P, S)$ be a right linear grammar [2]. Thus all productions in $P$ are of the form

$$A \to aB \quad \text{or} \quad A \to a$$

where $A$ and $B$ are syntactic variables in $V_N$, and $a$ is a terminal (or word) in $V_T$. We shall assume that $G$ is deterministic, by which we mean that for every pair $(A, a) \in V_N \times V_T$ there is at most one production in $P$ of the above form. We denote the set of sentences generated by $G$ by $L(G)$.

With $G$ we shall associate what we shall call the wiring diagram of $G$.

There is obviously a natural correspondence between the elements of $L(G)$ and the set of walks from $S$ to $F$ in $G$; i.e.,

$$L(G) = \{\, x_1 x_2 \cdots x_n \mid S \xrightarrow{x_1} X_1 \xrightarrow{x_2} X_2 \cdots X_{n-1} \xrightarrow{x_n} F \text{ are labelled arcs of } G \text{ for some } X_1, \dots, X_{n-1} \in V_N \,\}.$$

We shall assume throughout this paper that for each $A \in V_N$ in $G$ there is a path from $S$ to $F$ that passes through $A$.
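The correspondence between sentences and walks can be sketched in Python. This is a minimal illustration, not the paper's construction: productions are assumed to be given as hypothetical triples `(A, a, B)` for $A \to aB$ and `(A, a, None)` for $A \to a$, and the final node is assumed to be named `"F"`. The sketch builds the labelled arcs, rejects a grammar that is not deterministic, and returns the walk from $S$ to $F$ that spells a sentence (or `None` if the sentence is not in $L(G)$).

```python
FINAL = "F"  # name of the final node F (an assumption of this sketch)


def wiring_diagram(productions):
    """Arcs of G as a dict (A, a) -> B: A -> aB gives A -a-> B, and A -> a gives A -a-> F."""
    arcs = {}
    for lhs, word, rhs in productions:
        if (lhs, word) in arcs:
            # a second production for the same pair (A, a): G is not deterministic
            raise ValueError(f"G is not deterministic at {(lhs, word)}")
        arcs[(lhs, word)] = FINAL if rhs is None else rhs
    return arcs


def walk(arcs, sentence, start="S"):
    """Nodes S, X_1, ..., F of the walk spelling `sentence`, or None if it is not in L(G)."""
    nodes = [start]
    for word in sentence:
        nxt = arcs.get((nodes[-1], word))
        if nxt is None:
            return None
        nodes.append(nxt)
    return nodes if nodes[-1] == FINAL else None


# Hypothetical grammar S -> xA, A -> yS, A -> z, so that L(G) = x(yx)*z.
G = wiring_diagram([("S", "x", "A"), ("A", "y", "S"), ("A", "z", None)])
print(walk(G, ["x", "y", "x", "z"]))  # ['S', 'A', 'S', 'A', 'F']
print(walk(G, ["x", "y"]))            # None
```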


Definition. Given a deterministic right linear grammar $G$ and a finite abstract set of symbols $\Phi = \{\phi_1, \dots, \phi_m\}$, a syntactic translation is a map $f$ from the arc set $A(G)$ to $\Phi^*$.

If $A \xrightarrow{a} B$ is a labelled arc of $G$ and if the image of this arc under $f$ is $\phi$, where $\phi \in \Phi^*$, then graphically we write

$$A \xrightarrow{a \,|\, \phi} B$$

($\Phi^*$ is the set of finite-length sequences from $\Phi$, including $\Lambda$, the empty string).

This  definition  is  basically  equivalent  to  the  definition  of 
a  generalized  sequential  machine  (gsm)  [1],  where  f  is  called  an 
output  function. 

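By extending the definition of $f$ in the natural way we have

$$f_{ex} : L(G) \to \Phi^*;$$

i.e., if under $f$ we have

$$S \xrightarrow{a_1 \,|\, \phi^{(1)}} X_1 \xrightarrow{a_2 \,|\, \phi^{(2)}} X_2 \cdots X_{n-1} \xrightarrow{a_n \,|\, \phi^{(n)}} F$$

with $\phi^{(1)}, \dots, \phi^{(n)} \in \Phi^*$, then the sentence $a_1 a_2 \cdots a_n$ is mapped as

$$a_1 a_2 \cdots a_n \xrightarrow{f_{ex}} \phi^{(1)} \phi^{(2)} \cdots \phi^{(n)}.$$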
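A sketch of the extension $f_{ex}$, under the assumptions that an arc is identified by its tail and label `(node, word)` (which determinism permits) and that elements of $\Phi^*$ are Python tuples of symbol names. The grammar and outputs in the example are hypothetical.

```python
def f_ex(arcs, f, sentence, start="S", final="F"):
    """Image of a sentence under the extension f_ex of an arc translation f.

    arcs: (node, word) -> next node, for a deterministic wiring diagram
    f:    (node, word) -> tuple of symbols of Phi (the output phi on that arc)
    """
    node, image = start, ()
    for word in sentence:
        if (node, word) not in arcs:
            raise ValueError("sentence is not in L(G)")
        image += f[(node, word)]          # append phi^(i) of the i-th arc
        node = arcs[(node, word)]
    if node != final:
        raise ValueError("sentence is not in L(G)")
    return image


# Hypothetical example: S -> xA, A -> yS, A -> z with outputs x | p1, y | Λ, z | p2 p2.
arcs = {("S", "x"): "A", ("A", "y"): "S", ("A", "z"): "F"}
f = {("S", "x"): ("p1",), ("A", "y"): (), ("A", "z"): ("p2", "p2")}
print(f_ex(arcs, f, ["x", "y", "x", "z"]))  # ('p1', 'p1', 'p2', 'p2')
```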
2. TREE COMPOSITIONS

Definition. Let $\Sigma$ be a finite alphabet, and $x \in \Sigma^*$. A $k$-composition of $x$ is defined to be an ordered $k$-tuple $c \in (\Sigma^*)^k$, $c = (c_1, \dots, c_k)$, having the property that $c_1 c_2 \cdots c_k = x$. The set of $k$-compositions of $x$ is denoted $C_k(x)$.

For example, if $\Sigma = \{a, b, c\}$, then $C_3(ab^2c)$ is the set $\{(\Lambda, \Lambda, ab^2c), (\Lambda, a, b^2c), (\Lambda, ab, bc), \dots\}$, where $\Lambda$ denotes the empty word. In general,

$$|C_k(x)| = \binom{n+k-1}{k-1} \quad \text{if } |x| = n.$$
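The $k$-compositions can be enumerated by choosing $k-1$ cut positions among the $n+1$ positions of $x$, with repetition allowed (equal cuts give empty parts); this is also where the binomial count comes from. A minimal sketch, with $\Lambda$ represented as the empty string:

```python
from itertools import combinations_with_replacement
from math import comb


def k_compositions(x, k):
    """All k-compositions (c_1, ..., c_k) of the word x, i.e. c_1 c_2 ... c_k == x."""
    n = len(x)
    result = []
    # cut positions 0 <= p_1 <= ... <= p_{k-1} <= n; equal cuts give empty parts (Λ = "")
    for cuts in combinations_with_replacement(range(n + 1), k - 1):
        bounds = (0, *cuts, n)
        result.append(tuple(x[bounds[i]:bounds[i + 1]] for i in range(k)))
    return result


C3 = k_compositions("abbc", 3)            # x = ab^2c, n = 4
print(C3[:3])                             # [('', '', 'abbc'), ('', 'a', 'bbc'), ('', 'ab', 'bc')]
print(len(C3), comb(4 + 3 - 1, 3 - 1))    # 15 15
```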

The  notion  of  composition  is  extended  to  trees. 

Definition. Let $\Sigma$ be a finite alphabet and $T = (N(T), A(T))$ a rooted directed tree. Thus $T$ is a directed tree with a distinguished node $R \in N(T)$, and for each node $N \in N(T)$ there is a unique directed path from $R$ to $N$. The leaves of $T$, denoted $L(T) \subseteq N(T) - \{R\}$, are the nodes of $T$ with degree 1. Assume the elements of $L(T)$ are ordered $L_1, \dots, L_\ell$, where $\ell = |L(T)|$. For a given element $x = (x_1, \dots, x_\ell) \in (\Sigma^*)^\ell$, a $T$-composition of $x$ is defined by a function

$$tc : A(T) \to \Sigma^*$$

having the property that for each leaf $L_i$ of $T$, and unique path $a_1, \dots, a_k \in A(T)$ from $R$ to $L_i$,

$$tc(a_1)\, tc(a_2) \cdots tc(a_k) = x_i.$$

Thus a tree composition reduces to a $k$-composition when the tree is a rooted path consisting of $k$ connected arcs. An example of a tree composition of $(ab, ab, b, ba)$ is shown in Figure 3, for the complete binary tree with 7 nodes. Given $T$, along with an ordering for the leaves, and $x \in (\Sigma^*)^\ell$, we denote the set of all tree compositions of $x$ by $TC(T, x)$.


Figure 3. $TC(T, x)$ for $x = (ab, ab, b, ba)$ on the complete binary tree with 7 nodes (leaves 4, 5, 6, 7).
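One way to enumerate $TC(T, x)$ is to let each arc take any common prefix of the words still to be spelled at the leaves below it. The sketch below assumes the tree is given as a hypothetical `children` dictionary and that `x` maps each leaf to its word. For the example of Figure 3 it yields $3 \times 2 = 6$ tree compositions: three choices ($\Lambda$, $a$, $ab$) for the arc into node 2 and two ($\Lambda$, $b$) for the arc into node 3.

```python
from itertools import product


def tree_compositions(children, root, x):
    """All T-compositions of x, each as a dict {(parent, child): word}.

    children: node -> list of child nodes (leaves have no entry)
    x:        leaf -> word that the path from the root to that leaf must spell
    """
    def leaves_below(node):
        kids = children.get(node, [])
        return [node] if not kids else [leaf for k in kids for leaf in leaves_below(k)]

    def common_prefixes(words):
        # every prefix shared by all words, shortest first (Λ = "" always qualifies)
        shortest = min(words, key=len)
        out = []
        for i in range(len(shortest) + 1):
            if not all(w.startswith(shortest[:i]) for w in words):
                break
            out.append(shortest[:i])
        return out

    def expand(node, remaining):
        # remaining: leaf below `node` -> part of its word not yet spelled above `node`
        per_child = []
        for k in children.get(node, []):
            below = leaves_below(k)
            options = []
            for p in common_prefixes([remaining[leaf] for leaf in below]):
                rest = {leaf: remaining[leaf][len(p):] for leaf in below}
                if children.get(k):
                    options += [{(node, k): p, **sub} for sub in expand(k, rest)]
                elif rest[k] == "":             # the arc into a leaf must finish its word
                    options.append({(node, k): p})
            per_child.append(options)
        for combo in product(*per_child):
            merged = {}
            for part in combo:
                merged.update(part)
            yield merged

    return list(expand(root, dict(x)))


# Complete binary tree with 7 nodes; leaves 4, 5, 6, 7 and x = (ab, ab, b, ba).
children = {1: [2, 3], 2: [4, 5], 3: [6, 7]}
TC = tree_compositions(children, 1, {4: "ab", 5: "ab", 6: "b", 7: "ba"})
print(len(TC))  # 6
```

On a rooted path the same function enumerates ordinary $k$-compositions, e.g. 15 for $ab^2c$ on a path of 3 arcs.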


An element of $TC(T, x)$ can be represented as a non-negative integer lattice point in a natural way: if $a_1, \dots, a_{|A(T)|}$ is some ordering of the arcs, then

$$\big(|tc(a_1)|, \dots, |tc(a_{|A(T)|})|\big)$$

specifies a lattice point in $L = \mathbb{N}^{|A(T)|}$, where $\mathbb{N}$ denotes the non-negative integers.
We denote by $S[TC(T, x)]$ the set of lattice points defined above. A partial order $\le_T$ is defined in $L$: for $s, t \in L$,

$s \le_T t$ iff $t$ is obtained from $s$ by moving objects (letters) up the tree, i.e., toward the root.

For  example, 


We define, for $S \subseteq L$, $\max S$ as the set of elements $s$ of $S$ having the property that $s \le_T t$ holds for no $t \in S$ with $t \ne s$.
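The text describes $\le_T$ only informally. The sketch below assumes one reading of it: $s \le_T t$ when, for every arc, the total length on the path from the root through that arc is at least as large under $t$ as under $s$ (letters have only moved toward the root). Under that assumption, $\max S$ for the six lattice points of the Figure 3 example is the single point with $tc(1,2) = ab$ and $tc(1,3) = b$.

```python
def prefix_sums(point, arcs, parent):
    """For every arc, the total length spelled on the path from the root down to that arc."""
    length = dict(zip(arcs, point))
    sums = []
    for arc in arcs:
        total, cur = 0, arc
        while cur is not None:
            total += length[cur]
            cur = parent[cur]
        sums.append(total)
    return sums


def leq_T(s, t, arcs, parent):
    """Assumed reading of s <=_T t: t moves letters toward the root, so no prefix sum decreases."""
    return all(u <= v for u, v in zip(prefix_sums(s, arcs, parent), prefix_sums(t, arcs, parent)))


def max_points(S, arcs, parent):
    """max S: the points s of S such that s <=_T t holds for no t in S other than s."""
    return [s for s in S if not any(t != s and leq_T(s, t, arcs, parent) for t in S)]


# Complete binary tree on nodes 1..7; each arc is named by its (parent, child) pair.
arcs = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7)]
parent = {(1, 2): None, (1, 3): None, (2, 4): (1, 2), (2, 5): (1, 2), (3, 6): (1, 3), (3, 7): (1, 3)}
# S[TC(T, x)] for x = (ab, ab, b, ba): u = |tc(1,2)| in {0,1,2}, v = |tc(1,3)| in {0,1}
S = [(u, v, 2 - u, 2 - u, 1 - v, 2 - v) for u in range(3) for v in range(2)]
print(max_points(S, arcs, parent))  # [(2, 1, 0, 0, 0, 1)]
```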

3.  THE  INDUCTION  PROBLEM 

It is possible for two distinct syntactic translations to be extended to the same syntactic map. Thus we define an equivalence relation, $\sim$, on $S(G, \Phi^*)$ by defining $f_1 \sim f_2$ iff $f_1$ and $f_2$ are extended to the same element of $S_{ex}(L(G), \Phi^*)$.
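Equivalence can be illustrated (though not proved) by comparing $f_{ex}$ on all sentences up to a bounded length. In the hypothetical grammar below, $f_1$ and $f_2$ differ as maps on arcs, yet both send $x(yx)^n z$ to $p(qp)^n = (pq)^n p$, so $f_1 \sim f_2$. Arcs are identified by `(node, word)` and outputs are tuples of symbol names, both assumptions of this sketch.

```python
def syntactic_map(arcs, f, max_len, start="S", final="F"):
    """f_ex on every sentence of L(G) with at most max_len words (a bounded check only)."""
    images = {}
    frontier = [(start, (), ())]          # (node, sentence so far, image so far)
    for _ in range(max_len):
        step = []
        for node, sentence, image in frontier:
            for (src, word), dst in arcs.items():
                if src == node:
                    item = (dst, sentence + (word,), image + f[(src, word)])
                    if dst == final:
                        images[item[1]] = item[2]
                    else:
                        step.append(item)
        frontier = step
    return images


# Hypothetical grammar S -> xA, A -> yS, A -> z; f2 shifts the output of x onto later arcs.
arcs = {("S", "x"): "A", ("A", "y"): "S", ("A", "z"): "F"}
f1 = {("S", "x"): ("p",), ("A", "y"): ("q",), ("A", "z"): ()}
f2 = {("S", "x"): (), ("A", "y"): ("p", "q"), ("A", "z"): ("p",)}
print(f1 != f2, syntactic_map(arcs, f1, 8) == syntactic_map(arcs, f2, 8))  # True True
```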

The induction problem for syntactic translations is this: an observer $O$, who we assume knows the internal structure of the wiring diagram $G$ except for the syntactic translation, can observe sentences from $L(G)$ along with their images in $\Phi^*$ under the unknown syntactic translation. Thus he can observe the syntactic map for a few sentences in $L(G)$. $O$ wishes to discover an element $f \in S(G, \Phi^*)$ (up to equivalence) whose extension $f_{ex}$ is the observed syntactic map.


We assume $O$ can pick the sentences he wishes to observe. The theorem that follows shows, essentially, that $O$ can pick a finite number of sentences from $L(G)$ from which the syntactic translation can be discovered.

THEOREM: The syntactic translation (up to equivalence) can be discovered by observing a finite set $W$ of sentences.


Remark: What the theorem says is that on observing a finite set $W$ (to be constructed below), $O$ is presented with a finite number of word equations:

$$
\begin{aligned}
a_{11} a_{12} \cdots a_{1 i_1} &= \phi^{(1)} \\
&\;\;\vdots \\
a_{k1} a_{k2} \cdots a_{k i_k} &= \phi^{(k)}
\end{aligned}
\tag{E}
$$

where $|W| = k$, $a_{mn} \in A(G)$ (the arc set of $G$), and $\phi^{(j)}$ is the observed image in $\Phi^*$ corresponding to the sentence determined by the walk $a_{j1} \cdots a_{j i_j}$ in $G$. A solution of (E) (that is, an assignment of values in $\Phi^*$ to the arcs $A(G)$ so that (E) is satisfied) will solve the induction problem.
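Building the system (E) from observations amounts to walking each observed sentence through the wiring diagram. A minimal sketch, assuming arcs are identified by `(node, word)`, outputs are strings over a hypothetical $\Phi = \{p, q\}$, and the observed pairs are made up for illustration. It also shows that (E) can have several solutions: the second assignment differs from the first but satisfies the same equations.

```python
def word_equations(arcs, observations, start="S"):
    """System (E): one equation (walk as a list of arcs, observed image) per observed sentence."""
    system = []
    for sentence, image in observations:
        node, walk = start, []
        for word in sentence:
            walk.append((node, word))      # an arc is identified by its tail and label
            node = arcs[(node, word)]
        system.append((walk, image))
    return system


def solves(system, assignment):
    """True if the values assigned to the arcs concatenate to every observed image."""
    return all("".join(assignment[a] for a in walk) == image for walk, image in system)


arcs = {("S", "x"): "A", ("A", "y"): "S", ("A", "z"): "F"}
E = word_equations(arcs, [(("x", "z"), "pp"), (("x", "y", "x", "z"), "pqpp")])
print(solves(E, {("S", "x"): "p", ("A", "y"): "q", ("A", "z"): "p"}))   # True
print(solves(E, {("S", "x"): "", ("A", "y"): "pq", ("A", "z"): "pp"}))  # True
print(solves(E, {("S", "x"): "pp", ("A", "y"): "q", ("A", "z"): ""}))   # False
```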


Proof :  The  proof  follows  the  construction  of  the  implicit  functions 
in  [4]. 


We construct a spanning tree $T$ in $G$, rooted at $S$ and connecting all nodes in $V_N$. $F$ is not connected to the spanning tree. For the example of Figure 1, a spanning tree $T$ is indicated by darkened lines.

Label the arc set $A(G) = \{a_1, \dots, a_q\}$ in such a way that $A(T)$, the set of arcs in the spanning tree, is $\{a_1, \dots, a_k\}$.


From $\Phi$ and $A(T) = \{a_1, \dots, a_k\}$ we create a new set of symbols. In general, let $X$ be a finite alphabet $\{x_1, \dots, x_n\}$. Then define $X^\circ$ to be the group freely generated by the symbols of $X$, with $\Lambda$ the identity element. Form $(\Phi \cup A(T))^\circ$.
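Elements of $X^\circ$ can be handled as freely reduced words. In this sketch (all names are illustrative), a word is a list of `(symbol, ±1)` pairs and $\Lambda$ is the empty list; reduction uses a stack, which cancels every adjacent pair $x x^{-1}$ or $x^{-1} x$ in a single pass.

```python
def reduce_word(word):
    """Free reduction in X°: a word is a list of (symbol, +1 or -1); Λ is the empty list."""
    out = []
    for sym, exp in word:
        if out and out[-1] == (sym, -exp):
            out.pop()                      # cancel x x^-1 or x^-1 x
        else:
            out.append((sym, exp))
    return out


def inverse(word):
    """(y_1 ... y_m)^-1 = y_m^-1 ... y_1^-1."""
    return [(sym, -exp) for sym, exp in reversed(word)]


def multiply(*words):
    return reduce_word([letter for w in words for letter in w])


a1 = [("a1", 1)]                           # a spanning-tree arc used as a generator
pqq = [("p", 1), ("q", 1), ("q", 1)]       # an element of Phi* inside (Phi u A(T))°
print(multiply(inverse(a1), a1, pqq) == pqq)   # True
print(multiply(pqq, inverse(pqq)))             # []
```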

Begin at $F$ and consider all arcs $a$ entering $F$. Call this set $A(F)$; $A(F) \ne \emptyset$. Take an element $a$ in $A(F)$. In what follows, if $a$ is the arc $A \to F$, then $\alpha(a) = A$ and $\omega(a) = F$. Thus $\alpha(a) \in V_N$, and since $T$ spans $V_N$ there is some walk $w = a_{i_1}, \dots, a_{i_\ell}, a$ from $S$ to $F$ with $a_{i_1}, \dots, a_{i_\ell} \in A(T)$. The sentence determined by the walk $w$, call it $s$, is mapped to $\phi(s)$, which $O$ observes and writes

$$a = a_{i_\ell}^{-1} \cdots a_{i_1}^{-1}\, \phi(s) \in (\Phi \cup A(T))^\circ.$$

This is done for each element of $A(F)$.

$O$ now considers the arcs of $A(G) - (A(T) \cup A(F))$. Let $A^j$ be the set of arcs $a$ of $G$ not in $A(T)$ such that the number of arcs in the shortest path (a walk with no repeated nodes) from $\omega(a)$ to $F$ is $j$ (i.e., $A^0 = A(F)$). Suppose $O$ has computed the equations for the arcs in $A^0, \dots, A^{j-1}$. Let $a \in A^j$ and let $b_1, \dots, b_j$ be a shortest path from $\omega(a)$ to $F$. Now $\alpha(a) \in V_N$; hence

$$a_{i_1} \cdots a_{i_\ell}\, a\, b_1 \cdots b_j$$

is a walk from $S$ to $F$, with $a_{i_1}, \dots, a_{i_\ell} \in A(T)$. If this corresponds to sentence $s$, then $O$ observes $\phi(s)$, so that

$$a = a_{i_\ell}^{-1} \cdots a_{i_1}^{-1}\, \phi(s)\, b_j^{-1} \cdots b_1^{-1} \in (\Phi \cup A(T))^\circ$$

by using the equations for $b_1, \dots, b_j$ from previous computations (each $b_i$ not in $A(T)$ lies in $A^{j-i}$, which has already been treated). This process terminates with a list of equations
$$
\begin{aligned}
a_{k+1} &= g_1 \\
&\;\;\vdots \\
a_q &= g_{q-k}
\end{aligned}
\tag{I}
$$

where $g_1, \dots, g_{q-k}$ are elements of $(\Phi \cup A(T))^\circ$.
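The construction in the proof can be sketched as follows. Assumptions (not from the paper): arcs are named, with `arcs[name] = (alpha, word, omega)`; the spanning tree is supplied; arc names and symbols of $\Phi$ are distinct strings; ties between shortest paths to $F$ are broken by arc name; and the observer $O$ is simulated by a hypothetical hidden translation `HIDDEN`. Arcs are processed in the order $A^0, A^1, \dots$, so every non-tree $b_i$ already has its equation when it is needed.

```python
from collections import deque


def reduce_word(word):
    """Free reduction: cancel adjacent pairs x x^-1 and x^-1 x (Λ is the empty list)."""
    out = []
    for sym, exp in word:
        if out and out[-1] == (sym, -exp):
            out.pop()
        else:
            out.append((sym, exp))
    return out


def inv(word):
    return [(s, -e) for s, e in reversed(word)]


def build_equations(arcs, tree, observe, start="S", final="F"):
    """Return {non-tree arc: g}, with g a reduced word over Phi and A(T), as in (I).

    arcs:    arc name -> (alpha(a), word, omega(a))
    tree:    names of the spanning-tree arcs (rooted at `start`, spanning V_N)
    observe: sentence (tuple of words) -> observed image (tuple of Phi symbols)
    """
    descent, queue = {start: []}, deque([start])        # tree path S -> node
    while queue:
        node = queue.popleft()
        for name in sorted(tree):
            src, _, dst = arcs[name]
            if src == node and dst not in descent:
                descent[dst] = descent[node] + [name]
                queue.append(dst)
    dist, nxt, queue = {final: 0}, {}, deque([final])   # shortest paths to F
    while queue:
        node = queue.popleft()
        for name, (src, _, dst) in sorted(arcs.items()):
            if dst == node and src not in dist:
                dist[src], nxt[src] = dist[node] + 1, name
                queue.append(src)
    g = {}
    non_tree = sorted((n for n in arcs if n not in tree), key=lambda n: dist[arcs[n][2]])
    for a in non_tree:                                  # A^0 = A(F), then A^1, A^2, ...
        src, _, node = arcs[a]
        b_path = []
        while node != final:                            # b_1, ..., b_j
            b_path.append(nxt[node])
            node = arcs[nxt[node]][2]
        walk = descent[src] + [a] + b_path
        image = [(s, 1) for s in observe(tuple(arcs[n][1] for n in walk))]
        # tree arcs stay as generators; earlier non-tree arcs are replaced by their g
        b_word = [x for b in b_path for x in (g[b] if b in g else [(b, 1)])]
        descent_word = [(n, 1) for n in descent[src]]
        g[a] = reduce_word(inv(descent_word) + image + inv(b_word))
    return g


# Hypothetical wiring diagram with V_N = {S, A} and spanning tree {a1}.
ARCS = {"a1": ("S", "x", "A"), "a2": ("A", "y", "F"), "a3": ("S", "z", "F"),
        "a4": ("A", "x", "S"), "a5": ("S", "y", "S")}
HIDDEN = {"a1": ("p",), "a2": ("q", "q"), "a3": (), "a4": ("p", "q"), "a5": ("q",)}


def observe(sentence):
    """Stand-in for the observer O: run the hidden translation along the walk."""
    node, image = "S", ()
    for word in sentence:
        name = next(n for n, (s, w, _) in ARCS.items() if s == node and w == word)
        image, node = image + HIDDEN[name], ARCS[name][2]
    return image


G = build_equations(ARCS, {"a1"}, observe)
print(G["a2"])  # [('a1', -1), ('p', 1), ('q', 1), ('q', 1)]
```

For this example the sketch returns $a_2 = a_1^{-1}pqq$, $a_3 = \Lambda$, $a_4 = a_1^{-1}ppq$ and $a_5 = q$; substituting the hidden value $a_1 = p$ recovers the hidden values of $a_2, \dots, a_5$.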

For example, from Figure 3, if we define the arcs and the spanning tree as indicated in the figure, then the observer obtains word equations whose left-hand sides are the walks

$$a_1 a_3 a_4, \quad a_2 a_5, \quad a_2 a_6 a_5, \quad a_2 a_7 a_4, \quad a_1 a_9 a_4, \quad a_1 a_3 a_8 a_2 a_5,$$

and whose right-hand sides are the observed images of the corresponding sentences, words over $\phi_1, \dots, \phi_5$. These equations can be solved in the group $(\Phi \cup A(T))^\circ$ by the method indicated.

It follows from [4] that, given (I), the syntactic map is the same for all assignments of $a_1, \dots, a_k$ to elements of $\Phi^\circ$, and hence of $\Phi^*$. What this means is that, given the finite equations (I), an assignment of values in $\Phi^*$ to the arcs of the spanning tree $a_1, \dots, a_k$ such that $a_{k+1}, \dots, a_q$, as defined by (I), are in $\Phi^*$ will solve the induction problem. $\square$

A sequence $a_1, \dots, a_k \in \Phi^*$ such that $a_{k+1}, \dots, a_q$ are in $\Phi^*$ is called a feasible point.
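Checking whether an assignment to the tree arcs is a feasible point means substituting it into (I), reducing in the free group, and testing that no inverse letters remain (a reduced word lies in $\Phi^*$ exactly when all its exponents are $+1$). The equations below are for a hypothetical grammar with a single tree arc $a_1$ and $\Phi = \{p, q\}$; the sketch assumes every tree arc is given a value.

```python
def reduce_word(word):
    """Free reduction of a list of (symbol, +1 or -1) pairs."""
    out = []
    for sym, exp in word:
        if out and out[-1] == (sym, -exp):
            out.pop()
        else:
            out.append((sym, exp))
    return out


def feasible_values(g, tree_values):
    """Substitute tree-arc values (tuples over Phi) into each g_r of (I).

    Returns the resulting values of a_{k+1}, ..., a_q if all of them lie in Phi*
    (no inverse letter survives reduction), otherwise None.
    """
    values = {}
    for arc, word in g.items():
        letters = []
        for sym, exp in word:
            piece = [(s, 1) for s in tree_values[sym]] if sym in tree_values else [(sym, 1)]
            letters += piece if exp == 1 else [(s, -e) for s, e in reversed(piece)]
        reduced = reduce_word(letters)
        if any(e < 0 for _, e in reduced):
            return None
        values[arc] = tuple(s for s, _ in reduced)
    return values


# Equations (I) for a hypothetical grammar with one tree arc a1:
# a2 = a1^-1 p q q,  a3 = Λ,  a4 = a1^-1 p p q,  a5 = q
g = {"a2": [("a1", -1), ("p", 1), ("q", 1), ("q", 1)], "a3": [],
     "a4": [("a1", -1), ("p", 1), ("p", 1), ("q", 1)], "a5": [("q", 1)]}
print(feasible_values(g, {"a1": ("p",)}))  # {'a2': ('q', 'q'), 'a3': (), 'a4': ('p', 'q'), 'a5': ('q',)}
print(feasible_values(g, {"a1": ("q",)}))  # None
```

Both $a_1 = p$ and $a_1 = \Lambda$ are feasible here, while $a_1 = q$ leaves $q^{-1}$ in $a_2$.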

4.  THE  INDUCTION  SOLUTION 

The structure of equations (I) will help in solving the word equations. Instead of the equation $a_{k+r} = g_r$ in (I), let us consider its associated equation, $r = 1, \dots, q-k$,

$$\phi^{(r)} = a_{i_1} \cdots a_{i_\ell}\, a_{k+r}\, b_1 \cdots b_j,$$

as determined in the proof of Theorem 1. Thus $a_{i_1} \cdots a_{i_\ell}$ denotes a descent down the spanning tree $T$, $a_{k+r}$ the unknown in (I), and $b_1 \cdots b_j$ a shortest path from $\omega(a_{k+r})$ to $F$.

From $T$ we shall construct a new tree $T'$ by adding leaves to $T$ as follows. The new leaves will be labelled $a_{k+1}, \dots, a_q$ and will be attached respectively to the nodes

$$\alpha(a_{k+1}), \dots, \alpha(a_q).$$

Thus the spanning tree $T$ of Figure 1 becomes $T'$ in Figure 4.

If we consider $TC(T', x)$, where $x = (\phi^{(1)}, \dots, \phi^{(q-k)}) \in (\Phi^*)^{q-k}$ is the vector of observed images, then obviously the set of feasible points $a_1, \dots, a_k$ is contained in $TC(T', x)|_{a_1, \dots, a_k}$, that is, $TC(T', x)$ restricted to the arcs $a_1, \dots, a_k$. In some examples it turns out that a feasible point can be discovered by computing $\max[TC(T', x)]$, but this is not always the case. Consider Figures 5 and 6.
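Assuming $\le_T$ means that no root-to-arc prefix length decreases (one reading of "moving objects up the tree"), $\max TC(T', x)$ has a single element, obtained greedily: each arc takes the longest common prefix of what remains to be spelled at the leaves below it, since any tree composition spells some common prefix at every node. A sketch under that assumption; the tree $T'$ (one tree arc $a_1$ and new leaf arcs $a_2, \dots, a_5$) and the observed words in `x` are hypothetical.

```python
def longest_common_prefix(words):
    prefix = min(words, key=len)
    while not all(w.startswith(prefix) for w in words):
        prefix = prefix[:-1]
    return prefix


def greedy_max_composition(children, root, x):
    """Give every arc the longest common prefix of what is still unspelled at the leaves below it.

    children: node -> list of (arc name, child node); leaves of T' have no entry
    x:        leaf -> observed word phi^(r) for that leaf
    """
    def leaves(node):
        kids = children.get(node, [])
        return [node] if not kids else [leaf for _, k in kids for leaf in leaves(k)]

    tc = {}

    def descend(node, remaining):
        for arc, child in children.get(node, []):
            below = leaves(child)
            tc[arc] = longest_common_prefix([remaining[leaf] for leaf in below])
            descend(child, {leaf: remaining[leaf][len(tc[arc]):] for leaf in below})

    descend(root, dict(x))
    return tc


# Hypothetical T': tree arc a1 (S -> A) plus new leaf arcs a3, a5 at S and a2, a4 at A.
children = {"S": [("a1", "A"), ("a3", "L3"), ("a5", "L5")], "A": [("a2", "L2"), ("a4", "L4")]}
x = {"L2": "pqq", "L3": "", "L4": "ppq", "L5": "q"}
print(greedy_max_composition(children, "S", x))
# {'a1': 'p', 'a2': 'qq', 'a4': 'pq', 'a3': '', 'a5': 'q'}
```

Here the greedy point gives $a_1 = p$; Figure 7 is an example where the maximum is not a feasible point.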


Figure 6. $\max TC(T', x)|_{a_1, a_2, a_3}$.

Figure 6 gives $\max TC(T', x)|_{a_1, a_2, a_3}$, a feasible point (which is easily verified).

Figure 7 gives an example of a case where $\max TC(T', x)$ is not a feasible point.

An obvious necessary condition, in addition to the feasible points being in $TC(T', x)|_{a \in T}$, is that for each $r = 1, \dots, q-k$,

$$|\phi^{(r)}| = |a_{i_1}| + \cdots + |a_{i_\ell}| + |a_{k+r}| + |b_1| + \cdots + |b_j|.$$


Note that for the example in Figure 7, if we let $|a_i| = x_i$, then

$$
\begin{aligned}
x_2 + x_3 &= 2 \\
x_1 + x_5 + x_3 &= 3 \\
x_2 + x_4 + x_5 + x_3 &= 4.
\end{aligned}
$$

If $x_1 = 3$ and $x_2 = 1$, as we have in the $\max TC(T', x)|_{a \in T}$ solution, then there is no non-negative solution $(x_3, x_4, x_5)$.
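The infeasibility claim can be checked by brute force. Every coefficient is 1 and every variable occurs in an equation whose right-hand side is at most 4, so searching $0 \le x_i \le 4$ is exhaustive. A minimal sketch (the helper name is illustrative):

```python
from itertools import product

# Length equations of the Figure 7 example: (indices i of the x_i summed, right-hand side).
EQUATIONS = [((2, 3), 2), ((1, 5, 3), 3), ((2, 4, 5, 3), 4)]


def completions(fixed, free, bound=4):
    """Non-negative integer values for `free` that satisfy every equation together with `fixed`."""
    found = []
    for vals in product(range(bound + 1), repeat=len(free)):
        x = {**fixed, **dict(zip(free, vals))}
        if all(sum(x[i] for i in idx) == rhs for idx, rhs in EQUATIONS):
            found.append(vals)
    return found


print(completions({1: 3, 2: 1}, (3, 4, 5)))  # []
print(completions({1: 1, 2: 1}, (3, 4, 5)))  # [(1, 1, 1)]
```

With $x_1 = x_2 = 1$ instead, the completion $(x_3, x_4, x_5) = (1, 1, 1)$ exists.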
As before, we denote the word equation for the variable $a_{k+r} \in A^j$ by

$$a_{i_1} \cdots a_{i_\ell}\, a_{k+r}\, b_1 \cdots b_j = \phi^{(r)}.$$

Let us now assume that $b_1 \cdots b_j$ (a shortest path from $\omega(a_{k+r})$ to $F$) is chosen so that it is a suffix of a previously defined walk.

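THEOREM: A sufficient condition for an assignment of the arcs $a \in T$ to values in $\Phi^*$ to be feasible is that it satisfies

$$\max TC(T', \phi)$$

subject to

$$|w(j)| = |\phi^{(j)}|, \qquad j = 1, \dots, q-k, \tag{$*$}$$

where $w(j)$ is the walk from $S$ to $F$ associated with the variable $a_{k+j}$, and $|w(j)|$ is the total length of the values assigned to its arcs.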

Proof: Let $\phi(a)$, $a \in A(G)$, be the translation, so that for $r = 1, \dots, q-k$

$$\phi^{(r)} = \phi(a_{i_1}) \cdots \phi(a_{i_\ell})\, \phi(a_{k+r})\, \phi(b_1^{(r)}) \cdots \phi(b_j^{(r)}).$$

Let $\hat\phi(a)$ be the assignment determined by the criteria stated in the theorem.

We claim that for each $s = 1, \dots, \ell$

$$\hat\phi(a_{i_s}) \cdots \hat\phi(a_{i_\ell})\, \hat\phi(a_{k+r}) \cdots \hat\phi(b_j)$$

is a suffix of

$$\phi(a_{i_s}) \cdots \phi(a_{i_\ell})\, \phi(a_{k+r}) \cdots \phi(b_j).$$

If this were not true, then we would have, for some $s$,

$$\hat\phi(a_{i_1}) \cdots \hat\phi(a_{i_{s-1}})$$

being a proper prefix of

$$\phi(a_{i_1}) \cdots \phi(a_{i_{s-1}}),$$

and this contradicts maximality.

Consequently, $\hat\phi(b_1) \cdots \hat\phi(b_j)$ is a suffix of $\phi^{(r)}$ (by induction, $b_1 \cdots b_j$ is a suffix of a previously computed walk $a'_{i_1} \cdots a'_{i_{\ell'}}\, a_{k+r'}\, b'_1 \cdots b'_{j'}$), and $\hat\phi(a_{i_1}) \cdots \hat\phi(a_{i_\ell})$ is a prefix of $\phi^{(r)}$, so by (*) we have a solution in $\Phi^*$ for $\hat\phi(a_{k+r})$. $\square$


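The example of Figure 7 shows that, for points of $TC(T', \phi)$,

$$
\left.
\begin{aligned}
x_2 + x_3 &= 2 \\
x_1 + x_5 + x_3 &= 3 \\
x_2 + x_4 + x_5 + x_3 &= 4
\end{aligned}
\right\}
\;\Longrightarrow\; (x_2, x_1) \in \{(0,0), (0,1), (1,0), (1,1), (1,2)\}.
$$

The choice $(x_1, x_2) = (1, 1)$ corresponds to $\max TC(T', \phi)$ subject to (*), which is indeed feasible.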

It is evident that we may replace $TC(T', \phi)$ with a set of inequalities; e.g., for the example in Figure 7 we must have

$$x_1 \le 3, \qquad x_2 \le 1,$$

and for the example in Figure 5

$$x_1 \le \dots, \qquad x_2 \le 4, \qquad x_1 + x_3 \le 5.$$
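Combining the length equations (*) of the Figure 7 example with the inequalities $x_1 \le 3$, $x_2 \le 1$ and enumerating all non-negative integer solutions gives the admissible values for the two tree arcs. A brute-force sketch, listing the pairs in the order $(x_2, x_1)$; the bound $x_3, x_4, x_5 \le 4$ follows from the equations themselves.

```python
from itertools import product

# Figure 7 example: equations (*) plus the inequalities x1 <= 3, x2 <= 1 from TC(T', phi).
EQUATIONS = [((2, 3), 2), ((1, 5, 3), 3), ((2, 4, 5, 3), 4)]
BOUNDS = {1: 3, 2: 1, 3: 4, 4: 4, 5: 4}


def feasible_pairs():
    """Sorted set of (x2, x1) over all non-negative integer solutions within BOUNDS."""
    pairs = set()
    for vals in product(*(range(BOUNDS[i] + 1) for i in range(1, 6))):
        x = dict(zip(range(1, 6), vals))
        if all(sum(x[i] for i in idx) == rhs for idx, rhs in EQUATIONS):
            pairs.add((x[2], x[1]))       # listed as (x2, x1)
    return sorted(pairs)


print(feasible_pairs())  # [(0, 0), (0, 1), (1, 0), (1, 1), (1, 2)]
```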